Show the Kaggle kernel: eda-to-prediction-dietanic

Source: https://www.kaggle.com/ash316/eda-to-prediction-dietanic/notebook

Cross-industry standard process for data mining (CRISP-DM, see Wikipedia), phases:

business understanding

data understanding

data preparation

modeling

evaluation

deployment

Part1: Exploratory Data Analysis(EDA):

1)Analysis of the features.

  • import pandas as pd
  • data = pd.read_csv('../input/train.csv')
  • data.head()
  • data.isnull().sum()  # count of null values per column
  • How many survived?
  • Types Of Features
    • Categorical Features in the dataset: Sex, Embarked
    • Ordinal Features in the dataset: Pclass
    • Continuous Features in the dataset: Age
  • Analysing The Features
    • Sex
      • This looks interesting. There are far more men than women on the ship, yet almost twice as many women as men were saved. The survival rate for women on the ship is around 75%, while for men it is around 18-19%.
    • Pclass
      • People say money can't buy everything, but we can clearly see that passengers of Pclass 1 were given very high priority during the rescue. Even though there were far more passengers in Pclass 3, their survival rate is very low, somewhere around 25%.
      • For Pclass 1 the share of survivors is around 63%, while for Pclass 2 it is around 48%. So money and status matter. Such a materialistic world.
    • Age
      • Looking at the CrossTab and the FactorPlot, we can easily infer that survival for women from Pclass 1 is about 95-96%, as only 3 out of 94 women from Pclass 1 died.
      • It is evident that, irrespective of Pclass, women were given first priority during the rescue. Even men from Pclass 1 have a very low survival rate.
    • Embarked
    • SibSp
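The per-group survival figures quoted above can be derived with a group-by and a cross-tabulation. The following is a minimal sketch on a tiny hand-made frame, not the real train.csv: the values are illustrative only, and the variable names sample, rate_by_sex and counts are assumptions made for this example.

```python
import pandas as pd

# Tiny hand-made sample using the same column names as train.csv.
# The values are illustrative only, not the real Titanic data.
sample = pd.DataFrame({
    "Survived": [1, 1, 0, 0, 0, 0, 0, 1],
    "Sex": ["female", "female", "male", "male",
            "female", "male", "male", "male"],
    "Pclass": [1, 2, 3, 3, 3, 1, 2, 1],
})

# The mean of a 0/1 column is the share of survivors in each group.
rate_by_sex = sample.groupby("Sex")["Survived"].mean()

# Counts of died (0) and survived (1) for every Sex x Pclass combination.
counts = pd.crosstab([sample["Sex"], sample["Pclass"]], sample["Survived"])

print(rate_by_sex)
print(counts)
```

On the real data the same two calls would be applied to the data frame loaded from train.csv instead of sample.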

2)Finding any relations or trends considering multiple features.

  • Correlation Between The Features: Heatmap
    • POSITIVE CORRELATION: If feature B tends to increase as feature A increases, the two are positively correlated. A value of 1 means perfect positive correlation.
    • NEGATIVE CORRELATION: If feature B tends to decrease as feature A increases, the two are negatively correlated. A value of -1 means perfect negative correlation.
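The two definitions above can be checked on a toy example. This sketch assumes pandas' DataFrame.corr (Pearson correlation by default) and uses made-up columns a, b and c, which are not features of the Titanic dataset.

```python
import pandas as pd

# Made-up columns: b rises with a, c falls as a rises.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [10, 8, 6, 4, 2],
})

# Pairwise Pearson correlation matrix of all numeric columns.
corr = toy.corr()
print(corr)

# A heatmap of this matrix could then be drawn, for example with
# seaborn: sns.heatmap(corr, annot=True)
```

Here a and b are perfectly positively correlated and a and c perfectly negatively correlated, so the matrix contains only values of magnitude 1. Real features usually fall somewhere in between.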

Part2: Feature Engineering and Data Cleaning:

1)Adding a few features.

2)Removing redundant features.

3)Converting features into suitable form for modeling.

  • Age_band
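An Age_band feature turns the continuous Age column into a small number of ordinal groups. The notes do not say how the bands are defined, so the bin edges below (five bands of 16 years each, up to age 80) are an assumption made for illustration, as is the name ages.

```python
import pandas as pd

ages = pd.DataFrame({"Age": [4, 22, 35, 50, 71]})

# Assumed bin edges (not stated in these notes): five 16-year bands.
bins = [0, 16, 32, 48, 64, 80]

# pd.cut assigns each age to the interval (lower, upper] it falls into;
# the labels 0..4 become the ordinal band codes.
ages["Age_band"] = pd.cut(
    ages["Age"], bins=bins, labels=[0, 1, 2, 3, 4]
).astype(int)

print(ages)
```

Rows with a missing Age would have to be filled in before the final astype(int), since a missing value cannot be converted to an integer code.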

Part3: Predictive Modeling

1)Running Basic Algorithms.

2)Cross Validation.

3)Ensembling.

4)Extracting important features.
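Steps 2 and 4 might look roughly like the sketch below. It assumes scikit-learn, which these notes do not name, and uses synthetic data instead of the Titanic features. The choice of a random forest, and the names X, y, model, scores and importances, are likewise assumptions for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data: three random features, only the first decides the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0)

# 5-fold cross validation: train on four folds, score on the fifth,
# and repeat so that every fold is used once for scoring.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print("accuracy per fold:", scores)

# Fit on all data to read the impurity-based feature importances.
model.fit(X, y)
importances = model.feature_importances_
print("feature importances:", importances)
```

Averaging the per-fold scores gives a more stable estimate than a single train/test split, and the importances should point to the first feature because it alone determines y here.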